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Abstract: Operating system abstractions do not always 
reach high enough for direct use by a language or 
applications designer. The gap is filled by language-specific 
runtime environments, which become more complex for 
richer languages (CommonLisp needs more than C++, 
which needs more than C). But language-specific 
environments inhibit integrated multi-lingual 
programming, and also make porting hard (for instance, 
because of operating system dependencies). To help solve 
these problems, we have built the Portable Common 
Runtime (PCR), a language-independent and _ operating- 
system-independent base for modern languages. PCR 
offers four interrelated facilities: storage management 
(including universal garbage collection), symbol binding 
(including static and dynamic linking and loading), threads 
(lightweight processes), and low-level I/O (including 
network sockets). PCR is "common" because these 
facilities simultaneously support programs in several 
languages. PCR supports C, Cedar, Scheme, and 
CommonLisp intercalling and runs pre-existing C and 
CommonLisp (Kyoto) binaries. PCR is "portable" because 
it uses only a small set of operating system features. The 
PCR source code is available for use by other researchers 
and developers. 
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1. Introduction 
1.1 The Problem - interlanguage interoperability 


Although there are many facets to interoperability, one 
remains largely unassailed: closely coupled interoperation 
between programs written in different languages. By 
closely coupled we mean that an application as real-time or 
sophisticated as a device driver or a database management 
system might have different parts written in different 
languages. The parts could share data structures, memory, 
and threads of control. Interoperation without giving one 
language a primary role is to be preferred: the choice of a 
language should be determined by the semantic model 
needed, not by the degree of support from the operating 
environment. 


The principal prerequisites to closely — coupled 
interoperation are the ability to share an address space, the 
ability to bind symbolic names between programs written in 
different languages, the ability to share I/O, and the ability 
to share data representations between programs written in 
different languages. In addition to these problems of 
interlanguage interoperation, intralanguage interoperation 
in the form of lightweight concurrent threads is an 
important concept of modern programming languages and 
this feature must be made interoperable as well, so that all 
programs can properly respect each others’ critical sections. 


1.2 The Portable Common Runtime solution 


The Portable Common Runtime addresses the shared 
address space, symbol binding, and lightweight threads 
requirements for interoperability. The problem of shared 
data representations is beyond the scope of PCR, although 
certainly important for achieving much of the benefit of 
interlanguage interoperability. 


Ordinarily, the language implementor produces a language- 
specific runtime layer directly on top of the operating 
system. In this scenario, features such as garbage collection 
must either be added at the operating system level or the 
language-specific runtime level. If the latter approach is 
chosen the feature becomes very difficult or impossible to 
use interoperably. However, an approach relying on a new 
operating system is a barrier to interoperability with 
existing applications and the existing operating systems 
they rely upon. 


The PCR approach is to produce a common runtime layer 
between the operating system and the language runtimes. 
The PCR abstractions complement typical operating system 
abstractions such as virtual memory, communications, and 
file system. PCR allows exploration of the kind of features 
future generations of operating systems will have in their 
kernels to support interoperability. 


Threads 


PCR 


Symbols 


PCR differs from language-specific runtimes both in the 
sophistication of some of its features, and the paucity of 
others. Compared with the C standard library, for instance, 
it offers the new features of threads, garbage collection and 
dynamic loading, but does not offer string functions or 
sophisticated printing or input scanning. Our choice is to 
focus deliberately on those features that language 
implementations must share for closely coupled 
interoperability, while avoiding other features in a runtime 
library that are not so important to interoperation. We 
assume that features we do not implement can continue to 
be done on a language-dependent basis without seriously 
reducing interoperation. For example, the string functions 
of the C library are adequate for manipulating C strings in 
PCR, while another library could be used for another 
language’s character strings. This is not true for garbage 
collection, say, or the process model for which sharing data 
or critical sections requires a common abstraction. 


PCR fails to solve the whole problem of language and 
application interoperation in at least two ways. AS 
mentioned above, PCR does not solve the interlanguage 
data representation problem. Other attacks on 
interoperation, such as remote procedure call and 
Presentation Manager [Apik and Diehl 1988], do impose a 
standard method of data exchange. Second, PCR says 
nothing about a user interface. Again, other approaches, 
like Open Look and Motif, address this [Hayes and Baran 
1989]. 


For additional alternatives to our approach, see section 5 of 
this paper on Related Work. 


2. PCR Design Principles 


The PCR design was constrained by the following 
principles: 


1. implement above existing operating systems. 
2. support existing simple applications. 


3. permit the use of existing compilers, libraries, and 
binaries. 


4. let sophisticated applications be written. 


Implementing above the operating system means, in the 
first place, avoiding changes to operating system kernels, 
and the second place, not duplicating operating system 
functions. Therefore, for instance, a PCR implementation 
on Mach [Accetta et al. 1986] maps PCR threads into Mach 
threads. However, implementing PCR well requires from 
its base operating system certain functions not available 
from every operating system: it requires the ability to share 
memory and open files between operating system 
processes; it requires the ability to protect pages of 
memory, and to catch and restart from protection failures; 
it requires a file system. These features are available more 
and more, and so we traded off loss of portability to older 
operating systems for much greater functionality. 


Supporting existing simple applications means that, as we 
add potentially interfering new features, older 
programming styles can mostly remain intact. For instance, 
although we garbage collect storage allocated by C code, we 
do not require that C programmers replace their ‘malloc’ 
and ’free’ calls. PCR simply ignores the ‘free’s, and 
invisibly collects ‘malloc’ed space. Binary files that can be 
dynamically loaded in PCR can also be statically linked 
using the vanilla Unix ‘Id’ command. PCR is not perfect in 
this respect, as the details in following sections make clear, 
but it achieves a useful compromise between backward 
compatibility and new functionality. 


By permitting the use of existing compilers, libraries, and 
binaries, we help to enforce on ourselves our rule of 
supporting existing simple applications. We co-exist with a 
machine’s native stack and calling conventions, so compiler 
back-ends do not have to change, and we accept standard 
relocatable object file format, so precompiled code 
continues to work. For example, the complete SunView 
window system library operates unrecompiled, dynamically 
loaded and garbage collected in PCR. One thing that does 
not work is dynamically loading binaries from which 
relocation information has been removed. 


By saying we want to permit sophisticated applications, we 
mean applications most naturally expressed using PCR- 
specific features. For instance, an application managing 
many concurrent activities will use the threads facilities. 
The language that has stretched our interfaces farthest has 
been CommonLisp, because it already has notions of 
dynamic loading and of garbage collection. We had to 
make sure we offered facilities on top of which 


CommonLisp language implementors could work. For 
example, an implementation using tagged pointers must be 
able to co-exist with our collector. In general, our solutions 
were of two types: make the interfaces more general, and 
provide for upcalls [Clark 1985] when there was no other 
way. 


3. Design and Implementation of PCR 
3.1 Threads 
3.1.1 Background 


The PCR threads interface offers the usual semantics of 
monitors, monitor locks, condition variables, fork/join, 
aborting, etc. [Hoare 1974, Brinch-Hansen 1975]. As 
indicated above, we have worked to make the interface 
general enough to be used cooperatively by many different 
languages. PCR threads meet the runtime requirements of 
languages such as Cedar/Mesa [Swinehart et al. 1986], 
Modula-3 [Cardelli et al. 1988], CommonLisp [Steele 1984], 
and ARGUS [Liskov et al. 1987]; and can easily simulate 
other threads packages such as Cooper’s C-Threads for 
Mach, Sun’s Iwp [Sun 1988a], Bershad’s [Bershad et al. 
1988], etc. The following overview highlights noteworthy 
features of our implementation. 


Threads implementations fall into two categories: inside or 
outside the OS kernel. Implementations inside the kernel, 
such as Mach [Accetta et al. 1986] or V [Cheriton and 
Zwaenepoel 1983], have explicit knowledge of multiple 
threads per address space, and the OS scheduler treats such 
threads separately. Implementations outside the kernel 
generally use coroutines in a single heavyweight process. 
Coroutine implementations can be faster at thread 
switching, because they avoid any overhead associated with 
entering and leaving the kernel (similar to the speedup 
achieved by {Synthesis} [Pu et al. 1988], although via a 
different method). However, their reliance on only one 
heavyweight process introduces two serious problems: first, 
if that process ever blocks, all threads are blocked and 
second, there is no opportunity to use a multiprocessor to 
achieve true concurrent thread execution. Techniques for 
avoiding blocking--use of the Unix non-blocking I/O 
primitives, for example--can alleviate this problem, but they 
cannot entirely eliminate it, since some kinds of blocking 
(e.g. page faults) cannot be predicted or even detected 
outside the kernel. 


3.1.2 Implementation 


Our approach to implementing PCR threads in an 
operating system like Unix, which has no notion of a 
lightweight process, is to have a small number of 
heavyweight processes act as a pool of "virtual processors" 
("VPs") to execute the many threads. All VPs share a 
common address space. Each VP is treated by the PCR 
scheduler exactly like a cpu in a _ shared-memory 


multiprocessor system. This implementation avoids the 
problems of coroutine threads implementations while not 
suffering the cost of a kernel entry for every thread switch. 


In the normal case of a thread blocking predictably (e.g. by 
waiting on a monitor lock or condition) or being preempted 
at a timeslice, scheduling a new thread under this scheme is 
essentially a coroutine jump within a single VP. Non- 
blocking I/O and other techniques are used to make most 
instances of thread blocking predictable, and thus avoid 
most instances of VP blocking. Occasionally, however, a 
thread blocks unpredictably, say for a page fault or file 
system I/O. In that case the VP running the thread blocks; 
but the remaining VPs are still available for heavyweight 
process scheduling by the OS, and continue to run other 
threads. On a uniprocessor, assuming the number of 
available VPs exceeds the number of unpredictably blocked 
threads, the net effect is just to trade a heavyweight process 
switch (between VPs) for a lightweight switch (between 
threads in a single VP); some threads continue to make 
progress at all times. On a _ multiprocessor, ready threads 
execute concurrently (depending only on a reasonable base 
kernel implementation) with no change to the PCR 
implementation. 


The PCR implementation relies on a relatively small 
number of underlying kernel features, chief of which is the 
ability to share memory among heavyweight processes. 
Since this feature exists in OS/2, the Unix SVID, Mach, 
SunOS, and many other modern operating systems, we 
anticipate no serious portability problem. Other OS 
features required by PCR are the ability for heavyweight 
processes to interrupt one another and to catch interrupts, 
and the ability to define a medium-grained interval timer 
(our scheduler wakes up ten times a second for time- 
slicing). Our implementation runs better if it can also 
write-protect pages (used for stack red-zoning and parallel 
garbage collection), catch and restart from protection 
violations, and remap pages to different addresses. 


3.1.3 Debugging 


Debugging of threads is currently a bit difficult, and we are 
working to improve it. At present, there are a few 
interactive commands by which one can stop all VPs, run 
on a single VP, freeze or thaw individual threads, or 
examine an individual thread. Examining works like this: 
before examining, a single-process debugger (say Unix dbx) 
is pointed at a distinguished VP, and a breakpoint is set at a 
well-known location. When the examine command is given 
for a thread, the thread is scheduled on that VP, and forced 
to execute through the breakpoint location. The specified 
VP hits the breakpoint with the desired thread’s stack 
appearing as the VP process stack. 


Independently, we are developing an extensible, multi- 
language debugger for PCR [Sturgis 1989]. This debugger, 
called Cirio, handles operations such as starting and 
stopping threads and setting breakpoints in a uniform, 


language-independent way. Each supported language 
registers with Cirio a collection of objects that embody 
language-specific features such as symbol _ table 
interpretation, data type representation and stack frame 
layout. A preliminary, cross-machine version of Cirio is 
working for Cedar and C programs. 


3.2/0 


The I/O interface currently provided by PCR is a nearly- 
exact emulation of the Unix I/O system calls. This is 
certainly the least portable aspect of the PCR design, and 
we plan eventually to replace it. However, developing the 
ultimate general-purpose, powerful and fully portable I/O 
interface will involve substantial research and effort; the 
current design was simple to produce (we copied it) and has 
enabled us to write PCR-based applications and validate 
some implementation techniques. 


One limitation of Unix (and some other systems as well) is 
particularly troublesome when combined’ with the 
implementation of threads described above: the maximum 
number of open files that a single heavyweight process can 
hold is much less than the total number of open files 
supported by the system. In "normal" use of Unix, with 
each heavyweight process running a single application, the 
open file limit is large enough to be uninteresting. But we 
want to implement network servers and other large systems 
using PCR; it is important that the per-heavyweight-process 
resource limitations of Unix not translate into system-wide 
resource limitations for PCR. 


To deal with this problem, our implementation on top of 
Unix uses additional heavyweight processes as "I/O 
processors"("IOPs"), essentially to serve as caretakers for 
file descriptors. It works as follows: 


A file is opened by allocating a file descriptor slot in one of 
the IOPs and sending a message to that IOP asking it to 
open the file. While the file remains open, its descriptor 
remains in the IOP; the descriptor slots of the VPs are 
treated as an LRU cache of copied descriptors. To perform 
I/O on a descriptor, a thread first ensures that a copy of 
that descriptor exists in the VP’s descriptor cache. If 
necessary, the least recently used descriptor in the cache is 
replaced by a copy of the desired descriptor, which is 
transferred from the corresponding IOP using Unix- 
domain IPC or using stream operations on a specially 
written pseudo-device driver. Currently, all VPs maintain 
identical file descriptor caches, though this constraint could 
be relaxed at the cost of some complexity in the 
implementation. The thread then attempts a non-blocking 
I/O operation on the descriptor. If the operation fails 
because it would block, the thread sends a message to the 
IOP requesting notification when the descriptor becomes 
ready. It then waits on a condition variable, allowing the 
VP to schedule a different thread without blocking. 
Eventually the descriptor becomes ready and the IOP 
notifies the waiting thread, which wakes up and retries the 


I/O operation. This scheme works well under the obvious 
condition that the working set of descriptors fits in the VP’s 
descriptor cache. 


User code sees none of this, of course. PCR imposes a layer 
of indirection in the file descriptors, and mimics all the 
Unix I/O system call layer (read, write, open, ...). It does 
the same for the SunOS socket-oriented calls and the 
System V stream-oriented calls. 


This I/O design enables us to support applications 
requiring more open files than allowed in a single Unix 
heavyweight process, at the cost of occasionally having to 
fault copies of descriptors into the VP descriptor caches. 


3.3 Storage Management 
3.3.1 Background 


Storage management for many modern languages requires 
garbage collection. If programs are to make the most of a 
shared address space, it must be possible for them to share 
allocated data structures. This implies that storage 
allocation and garbage collection must be part of the 
common runtime rather than the individual language 
runtimes. An additional benefit of including storage 
management with garbage collection in the common 
runtime is that programmers in languages like C, which do 
not require garbage collection in their runtime, benefit 
from its inclusion. 


In order to work for languages that cannot guarantee 
pointer locations, the Portable Common Runtime uses a 
conservative collection scheme as implemented by Boehm 
[Boehm and Weiser 1988]. Two different storage allocation 
systems have been implemented for PCR. The first is a 
direct adaptation of Boehm’s Russell collector, with 
additions for typed objects and finalization. The second is a 
new implementation that is real-time, parallel, generational 
but noncopying, and handles pointers to the interior of 
objects. Because of its unique features, this second 
implementation is described in more detail in a separate 
paper [Demers et al. 1989]. Here we focus on the highlights 
common to both collectors, and in particular on the 
mechanisms common to both for finalizing objects in a 
conservative world, and for pointer-finding upcalls. 


Garbage collectors can be either reference counting or 
mark-and-sweep. Reference counting collectors impose 
overhead on each pointer manipulation; mark-and-sweep 
collectors defer the overhead to garbage collection time. 
Conservative collectors [Bartlett 1988] are a new type of 
mark-and-sweep collector. The advantage of conservative 
collectors is that they require no support from language 
implementations. Even without exact knowledge of which 
words in memory are pointers a conservative collector will 
never identify a reachable object as garbage; however, it 
may err in the other direction. 


As Bartlett and Boehm have shown, conservative or 


partially conservative collectors work for many languages. 
For PCR they have been extended in two ways: finalization 
and pointer finding upcalls. 


3.3.2 Finalization 


Finalization is the method by which an application can 
request that it get a chance to look at an object just before it 
is freed. The application can abort the free at that point, or 
let it continue. In PCR, finalization works as follows: 
finalization may be requested for any object by passing it to 
the PCR routine XR_ RegisterForFinalization. 
XR RegisterForFinalization returns a handle to 
the object. The handle may be turned into a true pointer to 
the object at any time, but is not a pointer for purposes of 
collection (i.e. an object can be finalized while handles for it 
still exist). 


During each collection, after the mark phase but before 
sweeping, the PCR collector executes the algorithm below: 


for each unmarked finalizable object o 
for each pointer p ino 
mark p, and mark all p“’s descendants 
for each finalizable object, o, still unmarked 
queue o for finalization by its owning application 


This algorithm has the difficulty of never finalizing circular 
lists. The circularity of such lists must be broken by using a 
handle produced by XR _ RegisterForFinalization 
for one of the links, rather than an actual pointer. The 
handle isn’t treated as a pointer by the collector, so 
finalization still occurs. 


An alternative implementation of finalization, used in the 
Cedar reference counted storage system[Rovner 1985], is a 
dangerous technique called package ref counts. It involves 
artificially lowering the reference count stored in each 
finalizable object. and having no way to tell a known from 
an unknown reference. We believe that our method, using 
explicit handles that can be tured into pointers, is safer 
and less error prone. 


Finalization is tricky, however it is done, but it is not 
frequently programmed directly. For instance, in the two 
million lines of Cedar code in use at PARC, only twelve 
calls register objects for finalization. Requiring careful 
programming in the use of this feature is therefore 
reasonable. However, doing without finalization is not 
possible: the twelve kinds of finalizable objects in Cedar 
include stream and network I/O objects, so nearly all 
applications indirectly use finalization. 


3.3.3 Improving performance of the conservative collector 


The PCR collector is conservative and so works even for 
languages that permit any word in memory to contain a 
pointer (such as C). However, for some languages (such as 
Cedar and Lisp) it is possible to tell exactly which bit 
patterns in memory are pointers. For these languages, 


collector speed can be improved because only words 
containing pointers need to be examined. Collection 
precision may also be improved because false pointers will 
not unnecessarily hold storage under the conservative 
assumption, although for at least one test this effect was 
small [Boehm and Weiser 1988]. Another reason for non- 
conservative pointer-finding is that some language 
implmentations use non-standard pointer representations to 
improve non-pointer performance (e.g. tagged pointers in 
some Lisp implementations). For all these reasons, the 
PCR design incorporates the notion of a pointer-finding 
upcall. The pointer-finding upcall works as follows: 


Each object is typed by the kind of pointer-finding upcall 
needed to deduce its pointers. There can be as many 
different upcalls as needed to find pointers in objects: one 
for tagged pointers, one finding pointers according to 
datatype-dependent pointer maps, another for entirely 
conservative pointer-finding, etc. New upcalls are 
introduced by registering them with the collector and 
receiving in return an upcall type code. The upcall type 
code is an optional additional parameter to object creation 
and is permanently associated with the created object. 
During the mark phase of collection, the collector uses the 
upcall associated with each object to find the pointers it 
contains. In the absence of an upcall, fully conservative 
pointer-finding is used. 


Our largest PCR applications do not use the upcall at 
all--they run completely conservative, even though in the 
case of Cedar we _ theoretically have enough type 
information to be more precise. Our experience with the 
upcall is in a special version of PCR that uses the upcall for 
compatibility with a CommonLisp implementation _ that 
uses tagged pointers. The cost of the upcall is about a 
microsecond per object on a 16 Mhz SPARC (sun-4/260). 
This is roughly twice the cost of conservatively examining a 
word in the object to see if it is a pointer (which requires at 
least a range check to see if it could be a value in the heap). 
Thus the upcall performance is better than the conservative 
performance if it can reject non-pointers twice as fast as the 
conservative check, with at least a constant improvement of 
two pointer checks. For instance, suppose a rate of pointers 
in objects of 25%. Then an upcall that positively identified 
pointers in objects, and spent an average of three 
conservative checktimes per pointer per object doing so, 
would on average have better performance for all objects of 
size greater than 5 words (cost of 2 word-times overhead for 
the upcall, and cost of 3.75 word-times for the 1.25 expected 
pointers per 5 word-object). 


3.4 Symbol Binding and Incremental Loading 


Finally, we come to the part of the PCR that does 
incremental linking and loading, maintaining the symbol 
tables used for this purpose and for debugging. This part of 
the PCR consists of two components. The first internalizes 
object code from external files and the second does 


symbolic name binding on the internalized code. 


Internalizing object code is, of course, dependent on the 
particular file formats used for object code. The 
incremental loader reads the code and data portions of the 
object file into dynamically allocated storage. It creates an 
internal symbol table corresponding to the symbol table in 
the file. It also relocates the loaded code and data and 
records the location of the code and data in a file for later 
use by the debugger. This file is used to generate a 
synthetic a.out file containing symbols for all the statically 
and dynamically loaded code. Debuggers such as dbx and 
gdb can then be used on dynamically loaded code by using 
the a.out file as their symbol source. 


The second component manages the internalized symbol 
tables. It is responsible for attempting to resolve undefined 
symbols in each incrementally loaded file against defining 
occurrences in previously loaded files and in libraries. Of 
course, finding a defining occurrence in a library causes the 
appropriate library component to itself be incrementally 
loaded. 


Each incrementally loaded module is checked for two 
special names: XR install and XR run. If present, 
these are called in that order. XR install performs any 
language-dependent symbol binding (Cedar and Lisp use it, 
for instance). XR run is the entry point to actually start 
executing the loaded code. 


The incremental loading code is fully compatible with 
existing Unix programs and libraries. Anything that can be 
dynamically loaded can also be statically bound into an 
instance of PCR. This enables us to debug PCR-based 
applications using dynamic loading, and then, using those 
same modules, easily construct a single executable program 
indistinguishable from any other executable binary on the 
machine. We use this, for instance, to make some of our 
common tools, like the Cedar compiler and Postscript and 
Interpress decomposers, look like ordinary Unix programs. 


4. Performance 


Our early experience with PCR as the foundation of our 
Cedar programming environment suggested that its 
performance was not an issue for that use. To quantify this 
feeling we undertook some measurements of PCR 
performance. 


This section is broken into 4 parts corresponding to overall 
performance as a user at the keyboard might see it, 
followed by sections detailing the performance of the PCR 
components. Times were measured on a Sun-4/260 
running SunOS 4.0.1 with the SunView window system up 
but idle, and no other activity besides the benchmark. 


4.1 Overall System Performance 


In this section, all times are averages over several runs of 


combined system and user time as measured by the SunOS 
getrusage call, reported either directly (for PCR), or from 
the the cshell ’time’ command. 


Running a tight loop counting to 30 million takes 31.3 
seconds in raw Unix, 31.7 seconds in PCR. This 1% 
overhead is accounted for by PCR’s internal 20-times-per- 
second clock interrupts, and its 10-times-per-second 
preemptive rescheduler. This measurement shows the 
penalty due to PCR for compute-bound jobs. 


As another measurement of PCR overhead, we ran the 
same tight counting loop many times at once, comparing 
multiprogramming using Unix processes with using PCR 
threads. For this measurement the number of times 
through the loop was in each case divided by the number of 
simultaneous executions, so an ideal completion time would 
have been the same as for the single tight counting loop, 
above. The details are in Table 1. This implementation of 
Unix seems to have a process switch overhead that goes up 
by 3% from 16 to 32 processes, and again from 32 to 64 
processes. These measurements show the benefit of 
lightweight threads over Unix processes for supporting 
many compute bound activities simultaneously. 


number of processes 
1 2 16 32 64 
UNIX,secs 31.3 328 347 355 368 
PCR, secs 31.7 33.2 339 340 34.0 


Table 1. Cpu-bound multiprocess times. 


Finally, as a realistic example, native unix troff takes 27.7 
seconds to process the Unix C-shell (CSH.1) manual entry. 
With troff dynamically loaded and linked into PCR, the 
same computation takes 28.0 seconds, exclusive of link and 
load time. This is the minimum overhead of about 1%. 


4.2 Threads 
4.2.1 Thread Switch Time 


Currently, the cost of a thread switch is about 77 usec 
(measured by JITIMER PROF option of the getitimer 
call). This includes a 35 usec overhead of a trap to the 
SunOS kernel to save the register-window. It also includes 
overhead for the debugging features discussed in Section 
3.1.3: the ability to freeze and thaw individual threads, to 
examine blocked or faulted threads, to switch dynamically 
between single and multiple virtual processors, etc. Some 
of these features will be greatly simplified or eliminated 
when the Cirio debugger is completed, and thread switch 
times should improve as a result, but even the current value 
would be acceptable. (There is an additional, hidden cost 
to thread switching, not reflected in our measurements: 
when a thread is dispatched, its register file is initially 
empty, so the thread incurs the overhead of "faulting-in" 
registers as necessary.) 


4.2.2 Monitor Entry and Exit Times 


An important performance parameter is the cost of 
acquiring and releasing a monitor lock in the (usual) case 
that there is no contention for the lock. In PCR the cost of 
calling a null ENTRY procedure (i.e., one that does nothing 
but acquire and then release a monitor lock) is 4.2 usec. 
Currently both locking and unlocking are done by 
procedure call, so this cost could be improved slightly by 
in-line expansion. Additional improvement could be 
achieved, at some expense in debuggability, by eliminating 
a field containing the identity of the thread holding a lock. 
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A program that opens a file, and then does 100,000 
iterations of Iseeking to the beginning and reading 1024 
bytes takes 23.3 seconds in raw unix, and 31.3 seconds in 
PCR. This is an overhead of about 40 usec for each of the 
Iseek and read calls. Some of this time is in entering and 
leaving monitors (two per call), and the rest is the overhead 
of checking for the special cases for such things as the read 
blocking or the descriptor not being in the cache. A better 
tuning for common cases would make a large improvement. 
On the other hand, 40 usecs overhead for an I/O that will 
take at least a few milliseconds seems acceptable. 


We do not yet have applications large enough to let us 
make realistic measurements of the PCR file descriptor 
cache hit ratio. Using a synthetic benchmark, we have 
measured the cost of a descriptor cache miss to be about 4.6 
msec. This figure includes system and user times, both on 
the virtual processor on which the fault occurred and on the 
I/O processor owning the desired descriptor. 


4.4 Storage management 


Our most widely used PCR implementation today (March 
1989) uses a collector based on Boehm’s [Boehm and 


Weiser 1988], modified to keep a type word before the first 
word of the object. Its collection speed is about a half 
second per megabyte of active object space (assuming no 
paging). Our newer collector ought to run at a similar 
speed, but on only 100kbytes at a time and touching many 
fewer pages [Demers et al 1989]. 


4.5 Symbol binding and dynamic loading 


Dynamically loading an object file into PCR is somewhat 
faster than processing the file using the Unix ld command. 
For example, the troff program discussed above is loaded 
into PCR in 1.3 seconds compared with the 1.7 seconds ld 
requires to process it. The reason seems to be that Id has 
many general cases to handle, and must also build an 
output file, while PCR loads and relocates in place. Of 
course, the cost of Id’ing is amortized over all the executions 
of the resulting output file, while the cost of dynamic 
loading can only be amortized over executions taking place 
within a single instance of PCR. 


5. Related Work 


Our work builds on previous research in light-weight 
processes, garbage collection, library management, etc., and 
references to these are in the main body of the text. In this 
section we collect the discussion about alternative 
approaches to language interoperability. 


One current approach to language _ interoperability, 
exemplified in Mercury [Liskov et al. 1988] and HRPC 
[Bershad et al. 1987], uses client-server models of 
interoperation where remote procedure call connects, and 
insulates, applications in different languages. The problem 
here is the lack of tight coupling. Remote procedure call, 
even when local but across address spaces, is usually much 
more expensive than calls within the same address space. 
When the language partitioning and the client/server 
partitioning match, RPC does well. When they do not 
match, they force the application writer to introduce 
artificial distinctions. 


Another approach to language interoperability uses a 
common base language to which other languages must 
conform. The foreign function call interfaces in Common 
Lisp [Sun 1988b, Franz 1988], are examples of this 
approach. The problem here is that the privileged language 
enjoys easier debugging, better access to services, and more 
attention from developers. The choice of language in which 
to write an application becomes distorted by issues beyond 
appropriate language semantics, and the languages 
interoperate asymmetrically. 


A third approach to language interoperability is to 
standardize on a common intermediate form. This is a 
variation on the privileged language approach, permitting 
different languages to interoperate as long as they use a 
common compiler back-end. In spite of several attempts in 


this direction [e.g. Tanenbaum et al. 1983], the restrictions 
on language designers and implementors have proven too 
severe for wide adoption. We hope our more modest 
approach (agreement on important parts of the runtime 
environment), by analogy with the success of common 
operating systems, will prove better in practice. 


A fourth "approach" is to say that language interoperability 
is bunk. Either there is one true language, or what really 
matters is not language but environment. Proponents of 
these views either focus efforts on inventing new languages 
to solve all their problems [U.S. DOD 1983] or in 
developing the single language environment as in Smalltalk 
[Goldberg and Robson 1983], Interlisp [Xerox 1985], Cedar 
[Swinehart et al. 1986], or the new DARPA environments 
proposal [Gabriel 1989]. We think different problems are 
attacked better in different languages, and that software 
engineers and computer scientists should not be restricted 
to a single semantic arrow in their quivers. 


6. Experience and Conclusions 


The Portable Common Runtime is in daily use by about ten 
researchers at PARC and by scores of users elsewhere in 
Xerox. PCR is the lowest-level foundation of future work 
in PARC’s Computer Science and Electronic Documents 
Labs. We are running about 500,000 lines of Cedar code on 
top of PCR as of August 1989 [Atkinson et al. 1989]. PCR, 
itself, is about 20,000 lines of C, and about 200 lines of 
assembler. 


Several of our uses push hard on the PCR facilities. For 
instance, we have an X window client that creates several 
threads per window. We also have an Interpress printer 
driver that reuses lots of free storage, and so stretches the 
collection facilities. To bring up a full Cedar world on our 
Sun workstations, more than 60 large modules (totaling 
over 5 megabytes) must be dynamically loaded. 


We routinely use PCR to intercall between C and Cedar, 
and intercalling with Kyoto Common Lisp receives a small 
amount of use. A small part of the Kyoto runtime was 
changed to use the PCR collector and dynamic loader; 
otherwise it is unchanged. We have tested the dynamic 
loading and automatic garbage collection of large pre- 
existing SunView applications, merely relinked to be 
relocatable instead of executable, and they run fine. 


Most of our use of PCR is under SunOS 4.0 on SPARC- 
based processors. We also have a small amount of 68020 
and Mach use. The use that shows PCR’s portability best is 
on CSL’s own SPARC-based computer, which has no 
operating system at all but shares memory with another 
processor running the Cedar operating system. This PCR 
has nothing Unix-like nearby to rely upon. Bringing up 
this PCR from our original SunOS SPARC-based version 
took less than a month. 


For the future, we hope to see PCR in wider use, both 
inside and outside PARC and Xerox. The source code and 
documentation for PCR is available from the Computer 
Science Laboratory at PARC for a copying charge. We 
hope to interest other portable language efforts, such as 
C++, Objective C, and Modula-3, in using PCR as their 
base. We hope to see at least the facilities offered by 
PCR threads, language-independent garbage collection, 
and user-controlled incremental linking and _ loading 
available in all future operating systems. Finally, we expect 
to continue push on the enabling technologies for language 
interoperation. Of particularly critical importance is further 
work on data representation. 
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